Search Video by Text
- Tutorial Difficulty: ★★☆☆☆
- 7 min read
- Languages: SQL (100%)
- File location: tutorial_en/thanosql_search/search_video_by_text.ipynb
- References: Kinetics-700, X-CLIP
Tutorial Introduction
Understanding Multi-modal Learning
Multi-modal refers to an environment in which information is communicated in various forms; a modality is a type of data. Machine learning on multi-modal data enables integrated analysis because the model learns jointly from different forms of data, such as images, text, and sensor readings.
OpenAI's CLIP is an image-text multi-modal deep learning model that learns to understand text and images together.
The following are examples and applications of the ThanoSQL text-video search algorithm.
- Use text descriptions to search from your own videos to return videos containing the scenes you want.
- Search for the scene you want using text from YouTube videos and so on.
In This Tutorial
👉 This tutorial uses the kinetics700-2020 dataset. Kinetics is a large video dataset of human actions released by DeepMind. Kinetics 700-2020, released in 2020, is a newer version of the Kinetics dataset and includes video clips covering 700 action classes.
ThanoSQL's X-CLIP model is a pre-built model that extends the image-text multi-modal CLIP model to understand the relationship between video and text. In this tutorial, we use this model to search for videos in the ThanoSQL workspace database from a text input.
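To make the idea of relating text and video more concrete, the sketch below compares a text vector with two video vectors using cosine similarity, a common choice for CLIP-style models. It is only an illustration: the three-dimensional vectors and the names `video_a` and `video_b` are made up, and this tutorial does not state which similarity measure ThanoSQL uses internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, invented for illustration; real model vectors are much longer.
text_embedding = [1.0, 0.0, 1.0]
video_embeddings = {
    "video_a": [2.0, 0.0, 2.0],  # points in the same direction as the text vector
    "video_b": [0.0, 3.0, 0.0],  # orthogonal to the text vector
}

# Score every video against the text and pick the closest one.
scores = {
    name: cosine_similarity(text_embedding, vector)
    for name, vector in video_embeddings.items()
}
best_match = max(scores, key=scores.get)
print(best_match, scores)
```

Because cosine similarity ignores vector length, `video_a` scores highest even though its values are twice as large as those of the text vector.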
0. Prepare Dataset & Model
As mentioned in the ThanoSQL Workspace, you must create an API token and run the query below before you can execute ThanoSQL queries.
%load_ext thanosql
%thanosql API_TOKEN=<Issued_API_TOKEN>
Prepare Dataset
%%thanosql
GET THANOSQL DATASET kinetics700_data
OPTIONS (overwrite=True)
Success
Query Details
- "GET THANOSQL DATASET" downloads the specified dataset to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL DATASET clause.
  - "overwrite": determines whether to overwrite a dataset if it already exists. If set as True, the old dataset is replaced with the new dataset (bool, optional, True|False, default: False)
%%thanosql
COPY kinetics700
OPTIONS (overwrite=True)
FROM 'thanosql-dataset/kinetics700_data/kinetics700.csv'
Success
Query Details
- "COPY" specifies the name of the database table in which the dataset is saved.
- "OPTIONS" specifies the option values to be used for the COPY clause.
  - "overwrite": determines whether to overwrite a table if it already exists. If set as True, the old table is replaced with the new table (bool, optional, True|False, default: False)
Prepare the Model
%%thanosql
GET THANOSQL MODEL xclip
OPTIONS (
model_name='tutorial_search_xclip',
overwrite=True
)
Success
Query Details
- "GET THANOSQL MODEL" downloads the specified model to the workspace.
- "OPTIONS" specifies the option values to be used for the GET THANOSQL MODEL clause.
  - "model_name": the name under which the model is stored in the ThanoSQL workspace (str, optional)
  - "overwrite": determines whether to overwrite a model if it already exists. If set as True, the old model is replaced with the new model (bool, optional, True|False, default: False)
1. Check Dataset
For this tutorial, we use the kinetics700 table located in the ThanoSQL workspace database. Run the query below to check the contents of the table.
%%thanosql
SELECT *
FROM kinetics700
LIMIT 5
| | video_path | label | duration |
|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/-dhP2A... | checking tires | 10 |
| 1 | thanosql-dataset/kinetics700_data/video/1ejgHK... | testifying | 10 |
| 2 | thanosql-dataset/kinetics700_data/video/2Yvab3... | checking tires | 10 |
| 3 | thanosql-dataset/kinetics700_data/video/3nFLLc... | punching person (boxing) | 10 |
| 4 | thanosql-dataset/kinetics700_data/video/5PfhCJ... | kitesurfing | 10 |
Understanding the Data Table
The kinetics700 table contains the following information.
- video_path: video path
- label: video label
- duration: length of the video
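As a point of comparison with the text search performed later in this tutorial, the sketch below shows why matching on the label column alone is brittle. The labels are copied from rows shown in this tutorial; the variable names and the matching logic are illustrative only.

```python
# Labels copied from kinetics700 rows shown in this tutorial.
rows = [
    {"label": "checking tires"},
    {"label": "punching person (boxing)"},
    {"label": "kitesurfing"},
    {"label": "bench pressing"},
]

query = "bench press"

# An exact comparison misses the row because the label is "bench pressing".
exact_matches = [row["label"] for row in rows if row["label"] == query]

# A substring check finds it, but only because the wording happens to overlap.
substring_matches = [row["label"] for row in rows if query in row["label"]]

print(exact_matches)
print(substring_matches)
```

A search based on embeddings does not depend on the query sharing exact words with a label, which is the motivation for vectorizing the videos in the following steps.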
To output the videos from the table, use the "PRINT" query statement.
%%thanosql
PRINT VIDEO
AS
SELECT video_path
FROM kinetics700
LIMIT 2
/home/jovyan/thanosql-dataset/kinetics700_data/video/-dhP2AH0eqI.mp4
/home/jovyan/thanosql-dataset/kinetics700_data/video/1ejgHKw8E3Y.mp4
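The printed paths begin with /home/jovyan, while the video_path values in the table begin with thanosql-dataset. A minimal sketch of that relationship follows, assuming the table stores paths relative to a workspace root of /home/jovyan; that root is inferred from the output above rather than taken from ThanoSQL documentation, and `to_absolute` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Assumption: the workspace root, inferred from the PRINT VIDEO output.
WORKSPACE_ROOT = PurePosixPath("/home/jovyan")

def to_absolute(video_path):
    """Join a workspace-relative video path onto the assumed workspace root."""
    return str(WORKSPACE_ROOT / video_path)

relative = "thanosql-dataset/kinetics700_data/video/-dhP2AH0eqI.mp4"
absolute = to_absolute(relative)
print(absolute)
```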
2. Convert Using a Pre-built Model
To vectorize the kinetics700 videos, run the "CONVERT USING" query. The vectorized results are stored in a user-defined column (default: 'convert_result') in the kinetics700 table.
%%thanosql
CONVERT USING tutorial_search_xclip
OPTIONS (
video_col='video_path',
table_name='kinetics700',
result_col='convert_result'
)
AS
SELECT *
FROM kinetics700
| | video_path | label | duration | convert_result |
|---|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/-dhP2A... | checking tires | 10 | [0.12786624, -0.32795882, 0.7120372, -0.202227... |
| 1 | thanosql-dataset/kinetics700_data/video/1ejgHK... | testifying | 10 | [0.09305692, -0.6363036, 1.1664735, -1.3211734... |
| 2 | thanosql-dataset/kinetics700_data/video/2Yvab3... | checking tires | 10 | [-0.36444482, -1.3532777, 0.6840816, 0.1075817... |
| 3 | thanosql-dataset/kinetics700_data/video/3nFLLc... | punching person (boxing) | 10 | [1.0626085, 0.24401952, -1.1193922, -0.450266,... |
| 4 | thanosql-dataset/kinetics700_data/video/5PfhCJ... | kitesurfing | 10 | [1.3834243, -0.16915198, 0.64251024, -0.636115... |
| ... | ... | ... | ... | ... |
| 93 | thanosql-dataset/kinetics700_data/video/wwgl_8... | land sailing | 10 | [1.2699885, -0.7124895, -0.012968205, -0.44243... |
| 94 | thanosql-dataset/kinetics700_data/video/xICkLB... | cutting nails | 10 | [-0.37504548, 1.011268, 0.08616501, -1.0632092... |
| 95 | thanosql-dataset/kinetics700_data/video/xlRC0n... | testifying | 10 | [0.028160986, -0.86515856, 1.3626868, -0.28014... |
| 96 | thanosql-dataset/kinetics700_data/video/yyy2Vy... | bench pressing | 10 | [0.7977668, -0.06954476, 0.52593017, 0.4088737... |
| 97 | thanosql-dataset/kinetics700_data/video/zb9HGN... | country line dancing | 10 | [0.9336661, -1.0631702, -0.50537837, -0.295630... |
98 rows × 4 columns
Query Details
- "CONVERT USING" uses tutorial_search_xclip as the algorithm for video vectorization.
- "OPTIONS" specifies the option values to be used for video vectorization.
  - "table_name": the name of the table in which the results are stored in the ThanoSQL workspace database. If an existing table is specified, it is replaced by the new table with a 'convert_result' column. If not specified, the result dataframe is not saved as a data table (str, optional)
  - "video_col": the name of the column containing the video path (str, default: 'video_path')
  - "result_col": the name of the column that contains the vectorized results (str, optional, default: 'convert_result')
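Conceptually, the conversion step adds one vector per row. The sketch below mimics that with plain Python dictionaries. Everything in it is hypothetical: `toy_embed` is a stand-in that derives numbers from the path string rather than from video content, `convert` is not a ThanoSQL function, and the file paths are invented.

```python
def toy_embed(video_path, dim=4):
    """Hypothetical stand-in for the model: a deterministic vector from the path."""
    seed = sum(video_path.encode("utf-8"))
    return [((seed * (i + 1)) % 97) / 97 for i in range(dim)]

def convert(rows, video_col="video_path", result_col="convert_result"):
    """Return copies of the rows with an extra column holding each video's vector."""
    return [{**row, result_col: toy_embed(row[video_col])} for row in rows]

# Invented rows shaped like the kinetics700 table.
rows = [
    {"video_path": "video/a.mp4", "label": "checking tires", "duration": 10},
    {"video_path": "video/b.mp4", "label": "kitesurfing", "duration": 10},
]

converted = convert(rows)
print(list(converted[0].keys()))
```

The parameter names mirror the "video_col" and "result_col" options so the correspondence with the query is easy to follow.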
3. Search
Perform a text-based video search using the "SEARCH VIDEO" query statement and the tutorial_search_xclip model. The following query calculates the similarity between the text "bench press" and the embedded kinetics700 videos.
%%thanosql
SELECT video_path, label, score
FROM (
SEARCH VIDEO
USING tutorial_search_xclip
OPTIONS (
search_by='text',
search_input='bench press',
emb_col='convert_result',
result_col='score'
)
AS
SELECT *
FROM kinetics700
)
ORDER BY score DESC
LIMIT 10
| | video_path | label | score |
|---|---|---|---|
| 0 | thanosql-dataset/kinetics700_data/video/qNB9qv... | bench pressing | 0.301461 |
| 1 | thanosql-dataset/kinetics700_data/video/yyy2Vy... | bench pressing | 0.279862 |
| 2 | thanosql-dataset/kinetics700_data/video/s0uI9I... | giving or receiving award | 0.225559 |
| 3 | thanosql-dataset/kinetics700_data/video/PmuHnz... | sharpening pencil | 0.198754 |
| 4 | thanosql-dataset/kinetics700_data/video/AfKqHI... | parasailing | 0.197273 |
| 5 | thanosql-dataset/kinetics700_data/video/6MWLkJ... | kitesurfing | 0.195590 |
| 6 | thanosql-dataset/kinetics700_data/video/a9S4Ox... | golf chipping | 0.194719 |
| 7 | thanosql-dataset/kinetics700_data/video/5PfhCJ... | kitesurfing | 0.194065 |
| 8 | thanosql-dataset/kinetics700_data/video/xlRC0n... | testifying | 0.185565 |
| 9 | thanosql-dataset/kinetics700_data/video/zb9HGN... | country line dancing | 0.183363 |
Query Details
- "SEARCH VIDEO" searches for videos. The text description of the video is passed through the "search_input" option, with "search_by" set to 'text'.
- "USING" specifies tutorial_search_xclip as the model.
- "OPTIONS" specifies the option values required for the video search.
  - "search_by": the type of input used for the search: image|text|audio|video (str)
  - "search_input": the input to be used for the search (str)
  - "emb_col": the column that contains the vectorized results (str)
  - "result_col": the name of the column that contains the search results (str, optional, default: 'search_result')
- "AS" defines the embedding table to be used for the search. In this example, the kinetics700 table is used.
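The overall shape of the query (score every row, order by score descending, keep the top results) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ThanoSQL's implementation: the `search` function is hypothetical, the two-dimensional unit-length vectors are invented, and a plain dot product is used as the score, which equals cosine similarity only for unit-length vectors.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))

def search(rows, query_vector, emb_col="convert_result", result_col="score", limit=10):
    """Score each row against the query, then mimic ORDER BY score DESC LIMIT n."""
    scored = [{**row, result_col: dot(query_vector, row[emb_col])} for row in rows]
    scored.sort(key=lambda row: row[result_col], reverse=True)
    return scored[:limit]

# Invented unit-length embeddings; the labels are taken from the tutorial's table.
rows = [
    {"label": "kitesurfing", "convert_result": [0.0, 1.0]},
    {"label": "bench pressing", "convert_result": [0.8, 0.6]},
    {"label": "testifying", "convert_result": [0.6, 0.8]},
]
query_vector = [1.0, 0.0]  # toy stand-in for the embedding of "bench press"

top = search(rows, query_vector, limit=2)
print([row["label"] for row in top])
```

The `emb_col` and `result_col` parameters play the same roles as the options of the same names in the query.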
%%thanosql
PRINT VIDEO
AS (
SELECT video_path
FROM (
SEARCH VIDEO
USING tutorial_search_xclip
OPTIONS (
search_by='text',
search_input='bench press',
emb_col='convert_result',
result_col='score'
)
AS
SELECT *
FROM kinetics700
)
ORDER BY score DESC
LIMIT 2
)
/home/jovyan/thanosql-dataset/kinetics700_data/video/qNB9qv6PqwI.mp4
/home/jovyan/thanosql-dataset/kinetics700_data/video/yyy2Vy_5DjI.mp4
4. In Conclusion
In this tutorial, we searched for videos in the kinetics700 dataset by text using a multi-modal text/video vectorization model. As this is a beginner-level tutorial, we focused on the process and on showing visible results rather than on accuracy. Video search results can be made more accurate by using a variety of queries.
- How to Upload My Data to the ThanoSQL Workspace
- How to Create a Table Using My Data
- How to Upload My Model to the ThanoSQL Workspace
Inquiries About Deploying a Model for Your Own Service
If you have any difficulties creating your own model using ThanoSQL or applying it to your services, please feel free to contact us below😊
For inquiries regarding building text-video search models: contact@smartmind.team